Approach to Study pH-Dependent Protein Association Using Constant-pH Molecular Dynamics: Application to the Dimerization of β-Lactoglobulin

Protein–protein association is often mediated by electrostatic interactions and modulated by pH. However, experimental and computational studies have often overlooked the effect of association on the protonation state of the protein. In this work, we present a methodological approach based on constant-pH molecular dynamics (MD), which aims to provide a detailed description of a pH-dependent protein–protein association, and apply it to the dimerization of β-lactoglobulin (BLG). A selection of analyses is performed using the data generated by constant-pH MD simulations of monomeric and dimeric forms of bovine BLG, in the pH range 3–8. First, we estimate free energies of dimerization using a computationally inexpensive approach based on the Wyman–Tanford linkage theory, calculated in a new way through the use of thermodynamically based splines. The individual free energy contribution of each titratable site is also calculated, allowing for identification of relevant residues. Second, the correlations between the proton occupancies of pairs of sites are calculated (using the Pearson coefficient), and extensive networks of correlated sites are observed at acidic pH values, sometimes involving distant pairs. In general, strongly correlated sites are also slow proton exchangers and contribute significantly to the pH-dependency of the dimerization free energy. Third, we use ionic density as a fingerprint of protein charge distribution and observe electrostatic complementarity between the monomer faces that form the dimer interface, more markedly at the isoionic point (where maximum dimerization occurs) than at other pH values, which might contribute to guide the association. Finally, the pH-dependent dimerization modes are inspected using PCA, among other analyses, and two states are identified: a relaxed state at pH 4–8 (with the typical alignment of the crystallographic structure) and a compact state at pH 3–4 (with a tighter association and rotated alignment). This work shows that an approach based on constant-pH MD simulations can produce rich detailed pictures of pH-dependent protein associations, as illustrated for BLG dimerization.


This ensures that
An analytic expression for the spline integral can be obtained by summing the contributions from each interval. Thus, the integral of f (x) from the initial point x 1 to a point x ∈ [x k , x k+1 ] can be written as is the interval-specific integral of h i between x i and an arbitrary value The integral of f (x) is then a piecewise quartic polynomial.

S-7
As discussed in section 2.4 of the main article, we are here interested in the case where the x i are the simulated pH values, the y i are the total or individual average proton occupancies, and the y i are the slopes obtained from equation 6 or 7. S-8

Bootstrap Errors
As discussed in section 2 of the main manuscript, r simulation replicates are run at each pH (r = 4 for the monomer and r = 8 for the dimer), each of which produces a time series of coordinates and protonation states. These r time series are then collected into a single set that is used to compute the intended quantities.
The statistical errors of all protonation-derived quantities were computed using a bootstrap method S7 performed over replicates, which avoids the need to account for the temporal correlations present within each replicate-specific time series. Furthermore, since some quantities are explicitly derived from the pH dependency (e.g., the pK a of a site is derived by fitting a Hill curve to its pH-dependent average protonation), each resampled data set is generated in way that includes that pH dependency. More exactly, each resampled data set is obtained by randomly selecting r new replicates from the original ones, at each of the six simulated pH values (which selects one data set from among the 2r−1 r 6 different ones corresponding to this combinatorial procedure). This was repeated 1000 times for the monomer and for the dimer simulations, and the resulting resampled data sets were used to estimate the statistical errors of the protonation-derived quantities using the usual bootstrap plugin rationale, S7 as explained below in more detail.
To estimate the error of the pK a and Hill coefficient (h) of each site, shown in Table 1, a Hill curve was fitted to the pH-dependent average protonation obtained from each of the 1000 resampled data sets. The standard error of each parameter, pK a and h, was then computed as the standard deviation of the set of 1000 parameter values thus produced. S7 Error bounds for each ∆∆G i curve were computed from the resampled data sets as follows: 1. A spline curve interpolating the average protonations was derived from each of the 1000 resampled data sets, as described in section 2.4 of the paper.
2. Each of the resulting 1000 monomer/dimer curve pairs was then used to compute a S-9 continuous ∆∆G i curve using eq. 5.
3. Each of these 1000 ∆∆G i curves was vertically least-square fitted to the original (nonresampled) curve.
4. At finely-spaced pH values a standard deviation was computed from the 1000 corresponding ∆∆G i values, producing an essentially continuous standard error envelope for the original (non-resampled) curve, as shown in Figure 5.
The vertical fitting follows from the fact that, as noted in section 2.4 of the paper, each integration produces only a curve shape; for example, choosing the same arbitrary pH ref for each of the 1000 resamples would give ∆∆G i (pH ref ) = 0 for all of them and, consequently, an error of zero at that arbitrary pH value, which is absurd. The procedure just described was also used to compute the standard error envelope of the global ∆∆G curve, shown in For each titratable site, the error of its average protonation at each simulated pH, shown in the plots of Figure S9, was obtained by computing a 68% confidence interval from the 1000 average protonations computed from the resampled data sets, using the percentile method. S7 This gives an interval whose magnitude should be roughly similar to the symmetric one of plus/minus one standard error, but avoids error bars that go below zero or above one.
For the correlation between a pair of sites, the standard error at each simulated pH value, shown in the plots of Figure Figure S3: Simulation time series of the secondary structure fraction. The secondary structure of the protein was assigned using the DSSP criterion defined by Kabsch and Sander. S8 The different types of helix are classified as a single helix content and the β-bridge and β-sheet are reported as a single β-structure content. Replicates 1-4 were started from an open conformation and 5-8 from a closed one.  COMs distance (nm) RMSD (nm) rep1 rep2 rep3 rep4 rep5 rep6 rep7 rep8 Figure S8: Identification of dissociated structures in the dimer simulations. The vertical axis corresponds to the distance between the centers of mass of the two dimer partners, and the horizontal axis to the RMSD of the dimer backbone relative to the initial dimer structure. The structures with a distance higher than 3.3 nm or an RMSD higher than 1.2 nm were considered dissociated and excluded from the analyses.  Table S3: Experimental BLG dimerization free energies used in Figure 4 of the main manuscript. The dimerization free energies were either taken directly from the original publication or computed from the reported association or dissociation constants (in which case the errors were estimated using the absolute differential formula S9 ). All free energies refer to a thermodynamic standard state of unit molarity. S10 Mercadante et al. report that at pH 4.5 and 5.5 the dissociation constant has submicromolar values (< 1 µM) which correspond to ∆G < −8.19 kcal/mol. In all cases the ionic strength was 0.1 M. Figure S10: Networks of the protonation correlations in BLG monomer that are above 0.15 in absolute value. At pH 8, no correlations above this cutoff are observed. Red/blue indicate negative/positive correlations and more intense colors correspond to higher absolute values. The strongest negative correlation is −0.58 (at pH 3) and the strongest positive correlation is 0.35 (at pH 6). Networks were drawn with the Graphviz software package (https://graphviz.org).

Dimerization Free Energy
S-52  Figure S11: Networks of the protonation correlations in BLG dimer that are above 0.15 in absolute value. At pH 8, no correlations above this cutoff are observed. Red/blue indicate negative/positive correlations and more intense colors correspond to higher absolute values. The strongest negative correlation is −0.43 (at pH 3) and the strongest positive correlation is 0.45 (at pH 6). Different rectangle colors represent different dimer partners. Networks were drawn with the Graphviz software package (https://graphviz.org). Figure S15: Density contours of 150 mM for the Na + (cyan) and Cl -(yellow) ions in the monomer simulations, oriented to show the monomer face that would be found at the interface of the dimer, with the plane between both partners represented as a grid. S-61

Dimer Configurations
Figure S16: Dihedral angles between the helices (top) and β-strands (bottom) at the interface, whose average values as a function of pH are shown in Figure 12 in the main article. The dihedral between helices is defined with the C atoms of Ala132, Leu140, Leu302 and Ala294, in this order. The dihedral between β-strands is defined with atoms N of Ile147, C of Ser150, C of Ser312 and N of Ile309, in this order.  S-65